
Time-Varying Neural Full-Rank Spatial Covariance Analysis (TV Neural FCA)

This paper presents an unsupervised multichannel method that can separate moving sound sources based on an amortized variational inference (AVI) of joint separation and localization. A recently proposed blind source separation (BSS) method called neural full-rank spatial covariance analysis (FCA) trains a neural separation model based on a nonlinear generative model of multichannel mixtures and can precisely separate unseen mixture signals. This method, however, assumes that the sound sources hardly move, and thus its performance is easily degraded by the source movements. In this paper, we solve this problem by introducing time-varying spatial covariance matrices and directions of arrival of sources into the nonlinear generative model of the neural FCA. This generative model is used for training a neural network to jointly separate and localize moving sources by using only multichannel mixture signals and array geometries. The training objective is derived as a lower bound on the log-marginal posterior probability in the framework of AVI. Experimental results obtained with mixture signals of moving sources show that our method outperformed an existing joint separation and localization method and standard BSS methods.
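The abstract above describes mixtures generated from time-varying full-rank spatial covariance matrices, with sources recovered by multichannel Wiener filtering. The following is only a toy NumPy sketch of that generative model and the corresponding filter, not the paper's code; all sizes and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, M, N = 4, 6, 2, 2  # freq bins, frames, mics, sources (toy sizes)

def random_psd(m):
    """A random positive-definite Hermitian matrix (a stand-in SCM)."""
    A = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    return A @ A.conj().T + 1e-3 * np.eye(m)

# Time-varying full-rank spatial covariance H[n, f, t] (M x M)
# and source power spectral densities lam[n, f, t].
H = np.array([[[random_psd(M) for _ in range(T)]
               for _ in range(F)] for _ in range(N)])
lam = rng.uniform(0.5, 2.0, size=(N, F, T))

# Mixture covariance per (f, t): Y = sum_n lam_n H_n (the FCA model).
Y = np.einsum('nft,nftij->ftij', lam, H)

# Separate an observation x with the multichannel Wiener filter:
# hat{x}_n = lam_n H_n Y^{-1} x
x = rng.normal(size=(F, T, M)) + 1j * rng.normal(size=(F, T, M))
Yinv = np.linalg.inv(Y)
est = np.einsum('nft,nftij,ftjk,ftk->nfti', lam, H, Yinv, x)

# Under the model, the per-source estimates sum back to the mixture.
assert np.allclose(est.sum(axis=0), x)
```

Because the per-source Wiener filters sum to the identity under the model, the separated estimates add back up to the observed mixture, which is a quick sanity check on the filter algebra.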
Figure: Overview of time-varying neural FCA, which jointly separates and localizes moving sources with time-varying spatial parameters.
Title Joint Separation and Localization of Moving Sound Sources Based on Neural Full-Rank Spatial Covariance Analysis
Authors Hokuto Munakata, Yoshiaki Bando, Ryu Takeda, Kazunori Komatani, Masaki Onishi
Journal IEEE Signal Processing Letters (2023)
Contents Paper PDF (open access on IEEE Xplore)

Separation results for real recordings

We separated 6-channel mixture signals recorded in our experimental room using the neural network evaluated in the paper. The mixture signals were dereverberated in advance with the weighted prediction error (WPE) method.
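WPE estimates a delayed linear predictor per frequency bin and subtracts the predicted late reverberation from the observation. The paper presumably relies on an established implementation; the sketch below is only a minimal single-bin toy version, with invented tap, delay, and iteration settings.

```python
import numpy as np

def wpe_1freq(X, taps=10, delay=3, iters=3):
    """Toy WPE for one STFT frequency bin. X: (M, T) complex."""
    M, T = X.shape
    # Stack delayed past frames (lags delay .. delay + taps - 1).
    Xtil = np.zeros((M * taps, T), dtype=complex)
    for k in range(taps):
        d = delay + k
        Xtil[k * M:(k + 1) * M, d:] = X[:, :T - d]
    D = X.copy()
    for _ in range(iters):
        # Time-varying source power, used as weights in the least squares.
        lam = np.maximum(np.mean(np.abs(D) ** 2, axis=0), 1e-8)
        Xw = Xtil / lam
        R = Xw @ Xtil.conj().T            # weighted correlation of past frames
        P = Xw @ X.conj().T               # cross-correlation with observation
        G = np.linalg.solve(R + 1e-6 * np.eye(M * taps), P)
        D = X - G.conj().T @ Xtil         # subtract predicted late reverb
    return D

# Toy check: a white source plus a single echo at lag 3.
rng = np.random.default_rng(1)
T = 2000
s = rng.normal(size=T) + 1j * rng.normal(size=T)
x = s.copy()
x[3:] += 0.8 * s[:-3]
D = wpe_1freq(x[None, :], taps=2, delay=3, iters=3)
assert np.linalg.norm(D[0] - s) < np.linalg.norm(x - s)
```

The delay keeps the direct path and early reflections untouched, so only late reverberation is predicted and removed, which is why WPE is a common front end for BSS.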

Result 1: Separation of two static sources

Input
Dereverberation results

Src 1 (static): This is a demonstration of time-varying neural FCA.

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Src 2 (static): これは時変深層フルランク空間相関分析のデモ動画です.
(Kore wa jihen sinsou furu-ranku kukan-soukan bunseki no demo douga desu: "This is a demo video of time-varying neural full-rank spatial covariance analysis.")

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Result 2: Separation of one static source and one moving source

Input
Dereverberation results

Src 1 (moving): This is a demonstration of time-varying neural FCA.

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Src 2 (static): これは時変深層フルランク空間相関分析のデモ動画です.
(Kore wa jihen sinsou furu-ranku kukan-soukan bunseki no demo douga desu: "This is a demo video of time-varying neural full-rank spatial covariance analysis.")

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Result 3: Separation of two moving sources

Input
Dereverberation results

Src. 1 (moving)

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Src. 2 (moving)

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Simulated Mixtures

Simulated trajectories provide additional insight into DOA-aware separation.
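For DOA-aware methods such as the DoA-HMM clustering baseline and TV neural FCA, a moving source corresponds to a time-varying steering vector determined by the array geometry. A hedged sketch follows; the 6-mic circular geometry and all numbers here are illustrative stand-ins, not the actual recording setup.

```python
import numpy as np

C = 343.0           # speed of sound [m/s]
FS = 16000          # sampling rate [Hz]
N_FFT = 512

# Hypothetical 6-mic uniform circular array, radius 5 cm.
M, r = 6, 0.05
mic_xy = r * np.stack([np.cos(2 * np.pi * np.arange(M) / M),
                       np.sin(2 * np.pi * np.arange(M) / M)], axis=1)  # (M, 2)

def steering(theta, freqs):
    """Far-field steering vectors for azimuth theta [rad]: (F, M) complex."""
    u = np.array([np.cos(theta), np.sin(theta)])   # unit direction of arrival
    tau = mic_xy @ u / C                           # per-mic delays [s]
    return np.exp(-2j * np.pi * freqs[:, None] * tau[None, :])

freqs = np.fft.rfftfreq(N_FFT, 1 / FS)             # (F,)

# A source moving from 0 to 90 degrees over 50 frames: one steering
# vector per frame yields a time-varying rank-1 spatial model.
thetas = np.linspace(0, np.pi / 2, 50)
A = np.stack([steering(th, freqs) for th in thetas])  # (T, F, M)
assert A.shape == (50, len(freqs), M)
```

Full-rank methods like FCA generalize this rank-1 model by letting the spatial covariance absorb reverberation and diffuse components, and TV neural FCA additionally lets it vary over time.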

Scene 1: 053a050b vs 051o020g

Input

Src. 1

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Src. 2

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Scene 2: 22ha010r vs 22ga010e

Input

Src. 1

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Src. 2

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Scene 3: 444c0209 vs 445c0209

Input

Src. 1

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA

Src. 2

cACGMM
FCA
FastMNMF2
DoA-HMM clustering
Neural FCA
TV Neural FCA